1 Introduction

Many theoretical extensions of the standard model (SM) predict the existence of color-triplet
scalar or vector bosons, called leptoquarks (LQ), that have fractional electric charge and carry
both lepton and baryon quantum numbers. These theories include grand unified theories [1],
composite models [2, 3], technicolor schemes [4-6], and superstring-inspired $E_6$ models [7]. We
follow the usual assumption that there are three generations of LQs, each of which couples
only to the corresponding generation of SM particles, to avoid violating the known
experimental constraints on flavor-changing neutral currents [8]. Leptoquarks would be produced at
the Large Hadron Collider (LHC) in pairs, predominantly through gg fusion and $q\bar{q}$
annihilation; the contributions from lepton t-channel exchange are suppressed by the leptoquark
Yukawa couplings. A leptoquark decays to a charged lepton and a quark with a branching
fraction $\beta$, usually considered a free parameter of the model, or to a neutrino and a quark
with branching fraction $1 - \beta$. For scalar LQs, the production cross section is determined by
the ordinary color coupling between an LQ and a gluon, which is model independent.

Numerous theories of particle physics beyond the SM address the gauge hierarchy problem
and other shortcomings of the SM by introducing a new symmetry that relates fermions and
bosons, called "supersymmetry" (SUSY) [9]. Supersymmetric models introduce a new discrete
symmetry, R-parity: all SM particles have $R_P = +1$, while all superpartners have
$R_P = -1$. Imposing R-parity conservation prohibits baryon and lepton number violating couplings,
which could otherwise lead to rapid proton decay. In models with R-parity conservation, SUSY
particles are produced in pairs, and the lightest SUSY particle (LSP) is stable. In some models
the LSP is the electrically neutral and weakly interacting neutralino ($\tilde{\chi}^0$), which
provides a dark matter candidate [10]. The left- and right-handed SM quarks have scalar partners
($\tilde{q}_L$ and $\tilde{q}_R$) that can mix to form scalar quarks (squarks) with mass
eigenstates $\tilde{q}_{1,2}$. Since the mixing is proportional to the corresponding SM fermion
masses, the effects can be enhanced for the third-generation squarks, yielding sbottom
($\tilde{b}_{1,2}$) and stop ($\tilde{t}_{1,2}$) mass eigenstates with large mass splitting. The
lighter mass eigenstate ($\tilde{b}_1$ or $\tilde{t}_1$) could be lighter than any other charged
SUSY particle [11]. Therefore, if sufficiently light, $\tilde{b}_1$ squarks could be produced at
the LHC either directly or through decays of gluinos (the supersymmetric partners of gluons). In
most SUSY models, a $\tilde{b}_1$ is expected to decay predominantly into a bottom quark and a
$\tilde{\chi}^0$, so that the final state consists of b jets and a sizable imbalance in transverse
energy ($E_T^{miss}$), defined as the magnitude of the vector opposite to the sum of the
transverse momenta of all detected particles.

In this paper we present results of a search for pair-produced scalar third-generation
leptoquarks (LQ3) with an electric charge of ±1/3 and for $\tilde{b}_1$. Each of the LQ3
($\tilde{b}_1$) particles decays into a b quark and a $\nu_\tau$ ($\tilde{\chi}^0$). In each case,
signal events are characterized by two high-transverse-momentum ($p_T$) b jets accompanied by
large $E_T^{miss}$. The resulting final state, consisting of jets, $E_T^{miss}$, and no charged
leptons, does not allow a full reconstruction of the decay chain, because the individual momenta
of the weakly interacting particles are not known.

Previous searches performed by the CDF and D0 collaborations at the Tevatron have excluded
LQ3 masses below 247 GeV, and set limits on the production of $\tilde{b}_1$ squarks for a range
of values in the $\tilde{b}_1$-$\tilde{\chi}^0$ mass plane that extend up to
$m(\tilde{b}_1)$ = 200 GeV for $m(\tilde{\chi}^0)$ = 110 GeV [12, 13]. A search performed by the
CMS collaboration has excluded the existence of a scalar LQ3 with an electric charge of ±2/3 or
±4/3 and with mass below 525 GeV, assuming a 100% branching fraction to a b quark and a $\tau$
lepton [14]. A search performed by the ATLAS collaboration excluded the production of
$\tilde{b}_1$ with masses up to 390 GeV, for $\tilde{\chi}^0$ masses below 60 GeV [15].

The main SM backgrounds in this search are $t\bar{t}$+jets, heavy-flavor (HF) multijet production,
and W or Z accompanied by HF production. In the case of multijet events and W/Z decays
to hadrons, the $E_T^{miss}$ is due to neutrinos in HF semileptonic decays and to effects of jet
energy resolution and mismeasurements. In the case of W/Z decays to leptons, genuine
$E_T^{miss}$ results from the escaping neutrinos when the charged lepton (e or $\mu$) goes
undetected, or from $\tau$ decays.



2 The CMS apparatus 

A detailed description of the Compact Muon Solenoid (CMS) detector can be found
elsewhere [16]. The central feature of the CMS detector is the superconducting solenoid magnet, of
6 m internal diameter, providing a magnetic field of 3.8 T. The silicon pixel and strip tracker, the
lead-tungstate crystal electromagnetic calorimeter (ECAL), and the brass/scintillator hadron
calorimeter (HCAL) are contained within the solenoid. Muons are detected in gas-ionization
chambers embedded in the steel return yoke. The ECAL has a typical energy resolution of
1-2% for electrons and photons above 100 GeV. The HCAL, combined with the ECAL, measures
the jet energy with a resolution $\Delta E/E \approx 100\%/\sqrt{E/\mathrm{GeV}} \oplus 5\%$.

CMS uses a right-handed coordinate system, with the origin located at the nominal collision
point, the x axis pointing towards the center of the LHC ring, the y axis pointing up
(perpendicular to the plane of the LHC ring), and the z axis along the counterclockwise-beam
direction. The azimuthal angle $\phi$ is measured with respect to the x axis in the x-y plane and
the polar angle $\theta$ is defined with respect to the z axis. The pseudorapidity is defined as
$\eta = -\ln[\tan(\theta/2)]$.



3 Razor variables 

Although the signal considered in this analysis consists of two high-$p_T$ b jets and
$E_T^{miss}$, additional jets may be produced by initial- or final-state radiation (ISR/FSR). We
study the effect of such radiation with Monte Carlo (MC) simulation samples. To reduce the
systematic uncertainty due to the imperfect simulation of ISR/FSR, we force every event into a
dijet topology by combining all the jets in the event into two "pseudojets", following the
"razor" methodology and variables [17, 18]. The pseudojets are constructed as a sum of the
four-momenta of their constituent jets. After considering all possible partitions of the jets into
two pseudojets, the combination that minimizes the sum in quadrature of the pseudojet masses is
selected.
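
The pseudojet construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code: jets are represented as hypothetical `(E, px, py, pz)` tuples, the function name `make_pseudojets` is invented here, and the search is a brute-force scan over all two-group partitions. Minimizing the sum of squared pseudojet masses is equivalent to minimizing their sum in quadrature.

```python
def add4(vectors):
    """Sum four-momenta given as (E, px, py, pz) tuples."""
    return tuple(sum(components) for components in zip(*vectors))


def mass2(p):
    """Invariant mass squared of a four-momentum (E, px, py, pz)."""
    e, px, py, pz = p
    return e * e - px * px - py * py - pz * pz


def make_pseudojets(jets):
    """Return the two pseudojets that minimize m1^2 + m2^2 over all partitions."""
    n = len(jets)
    if n < 2:
        raise ValueError("need at least two jets")
    best = None
    # Jet 0 always stays in group A, so each partition is visited only once.
    # Bit (i - 1) of the mask decides whether jet i goes to group B.
    for mask in range(1, 2 ** (n - 1)):
        group_a = [jets[0]]
        group_b = []
        for i in range(1, n):
            if (mask >> (i - 1)) & 1:
                group_b.append(jets[i])
            else:
                group_a.append(jets[i])
        pa, pb = add4(group_a), add4(group_b)
        score = mass2(pa) + mass2(pb)
        if best is None or score < best[0]:
            best = (score, pa, pb)
    return best[1], best[2]
```

For example, with two back-to-back jets and a third soft jet collinear with the first, the soft jet is merged with the jet it is collinear with, since that partition gives two massless pseudojets.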

The razor methodology provides an inclusive technique to search for production of heavy
particles, each decaying to a visible system of particles and a weakly interacting particle. As an
example, let us consider the pair production of two massive particles, denoted S, each decaying
to a b quark and a neutral weakly interacting particle, $\chi$, as $S \to b\chi$. In the rest
frame of each particle S, the decay products have a unique momentum $|\vec{p}|$ resulting from the
two-body decay of S, given by:

$|\vec{p}| = \frac{M_S^2 - M_\chi^2}{2 M_S}$,    (1)

where the mass of the b quark is neglected in this expression. This characteristic momentum,
which is denoted $M_\Delta$ and is referred to as the "momentum scale", is the same in each decay
instance, and can be used to distinguish this particular signal from SM backgrounds in the same
final states. The razor mass, $M_R$, is an event-by-event estimator of this scale calculated
through a series of approximations, motivated by physics, meant to estimate the rest frames of the
respective particles S [17, 18], and is defined as:

$M_R \equiv \sqrt{(|\vec{p}^{\,j_1}| + |\vec{p}^{\,j_2}|)^2 - (p_z^{j_1} + p_z^{j_2})^2} \sim 2 M_\Delta$,    (2)

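where $|\vec{p}^{\,j_i}|$ ($p_z^{j_i}$) is the absolute value (the longitudinal component) of the
i-th pseudojet momentum. An average transverse mass $M_T^R$ can be defined as:

$M_T^R \equiv \sqrt{\frac{E_T^{miss}\,(p_T^{j_1} + p_T^{j_2}) - \vec{E}_T^{miss} \cdot (\vec{p}_T^{\,j_1} + \vec{p}_T^{\,j_2})}{2}}$,    (3)

whose maximum value for signal events equals $M_\Delta$. The dimensionless variable R is then
defined as:

$R \equiv \frac{M_T^R}{M_R}$.    (4)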
For the signatures examined in this analysis, the value of $M_R$ can have different
interpretations. In the case of LQ3 pair production, the LQ3 corresponds to the particle S from the
above example, while $\chi$ is a neutrino. As a result, the characteristic scale $M_\Delta$ is an
estimator of the LQ3 mass. Similarly, for $\tilde{b}_1$ pair production, S refers to a
$\tilde{b}_1$ while $\chi$ is the LSP, generally a massive neutralino. In this case, $M_\Delta$
corresponds to the mass difference between the $\tilde{b}_1$ and the LSP.

As follows from the definitions above, $M_T^R$ is expected to have a kinematic endpoint at the
mass of the new heavy particle, in a similar fashion to the transverse mass having an edge at
the particle mass (such as $M_T$ in $W \to \ell\nu$ events). Therefore, the R variable is a measure
of how well the missing transverse momentum is aligned with respect to the visible momentum. If the
missing momentum is completely back-to-back to the visible momentum, R will be close to one.
On the other hand, if the momenta of the two neutrinos or $\tilde{\chi}^0$ largely cancel each
other, R will be small. The distribution of R for signal events will peak around 0.5, while for QCD
multijet events it peaks at zero. These properties of R and $M_R$ motivate the kinematic
requirements for the signal selection and background reduction, which are discussed below.
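
As an illustration of how these variables behave, the following Python sketch computes $M_R$, $M_T^R$, and R from two pseudojet momenta and the missing transverse momentum. It assumes the standard razor definitions of Refs. [17, 18], namely $M_R = \sqrt{(|\vec{p}^{\,j_1}| + |\vec{p}^{\,j_2}|)^2 - (p_z^{j_1} + p_z^{j_2})^2}$, $M_T^R = \sqrt{[E_T^{miss}(p_T^{j_1} + p_T^{j_2}) - \vec{E}_T^{miss} \cdot (\vec{p}_T^{\,j_1} + \vec{p}_T^{\,j_2})]/2}$, and $R = M_T^R/M_R$; the function name and the tuple-based inputs are hypothetical.

```python
import math


def razor_variables(j1, j2, met):
    """Return (M_R, M_T^R, R).

    j1, j2: pseudojet three-momenta as (px, py, pz) tuples.
    met:    missing transverse momentum as an (x, y) tuple.
    """
    p1 = math.sqrt(sum(c * c for c in j1))
    p2 = math.sqrt(sum(c * c for c in j2))
    m_r = math.sqrt((p1 + p2) ** 2 - (j1[2] + j2[2]) ** 2)

    pt1 = math.hypot(j1[0], j1[1])
    pt2 = math.hypot(j2[0], j2[1])
    met_mag = math.hypot(met[0], met[1])
    dot = met[0] * (j1[0] + j2[0]) + met[1] * (j1[1] + j2[1])
    # Guard against tiny negative values caused by rounding.
    m_t_r = math.sqrt(max(0.0, (met_mag * (pt1 + pt2) - dot) / 2.0))

    return m_r, m_t_r, m_t_r / m_r
```

A perfectly balanced dijet event with no missing momentum gives R = 0, the multijet-like limit, whereas an event whose missing momentum recoils against both pseudojets gives a sizable R.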

Some differences between the kinematic distributions (such as the transverse momenta of b
jets) for LQ3 production and $\tilde{b}_1$ production may arise if the mass of the
$\tilde{\chi}^0$ is substantial or even almost degenerate with the mass of the $\tilde{b}_1$. For a
fixed $\tilde{b}_1$ mass, $M_\Delta$ decreases as the $\tilde{\chi}^0$ mass increases. In the case
of an almost degenerate $\tilde{\chi}^0$ and $\tilde{b}_1$, the $E_T^{miss}$ is relatively small
and the jets are soft, resulting in an $M_R$ distribution shifted towards lower values, thus
reducing the momentum of the $\tilde{b}_1$ decay products and the sensitivity of the search.
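
The dependence of the momentum scale on the masses can be made concrete with a small numerical sketch. It assumes the two-body decay momentum $|\vec{p}| = (M_S^2 - M_\chi^2)/(2 M_S)$ with the b-quark mass neglected; the function name and the mass values in the usage note are purely illustrative.

```python
def momentum_scale(m_parent, m_chi=0.0):
    """Decay momentum (GeV) in the parent rest frame, neglecting the b-quark mass."""
    if m_chi >= m_parent:
        raise ValueError("the weakly interacting particle must be lighter than the parent")
    return (m_parent ** 2 - m_chi ** 2) / (2.0 * m_parent)
```

For a 500 GeV parent and a massless neutrino this gives 250 GeV, while for a 500 GeV parent and a 450 GeV neutralino it drops to 47.5 GeV, which is why nearly degenerate spectra produce soft jets and little $E_T^{miss}$.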



4 Data samples, triggers, and event selection 

The analysis is designed using MC samples generated with PYTHIA (version 6.424) [19] and
MadGraph [20] (version 5.1.1.0), and processed with a detailed simulation of the CMS detector
response based on Geant4 [21]. Events with QCD multijets, top quarks, and electroweak
bosons are generated with MadGraph interfaced with PYTHIA tune Z2 [22] for parton showering,
hadronization, and the underlying event description. Signal samples for LQ3 masses
from 200 to 650 GeV, in steps of 50 GeV, are generated with PYTHIA tune D6T [23, 24]. The
$\tilde{b}_1$ pair production signal samples are generated with the PYTHIA generator and processed
with a detailed fast simulation of the CMS detector response [25]. The scalar bottom quark signal
samples are generated with $\tilde{b}_1$ masses from 100 GeV to 550 GeV in steps of 25 GeV, and
$\tilde{\chi}^0$ masses from 50 GeV to 500 GeV in steps of 25 GeV. The $\tilde{b}_1$ samples are
generated with the assumption that the mass peak can be described by a Breit-Wigner shape [19], but
this assumption becomes imprecise when the sparticles are close to degenerate. Samples where the
difference between the $\tilde{b}_1$ mass and the $\tilde{\chi}^0$ mass is less than 50 GeV are
therefore not generated. The simulated events are reweighted so that the distribution of the number
of overlapping pp interactions per beam crossing ("pileup") in the simulation matches that observed
in data.
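
The pileup reweighting mentioned above can be illustrated with a minimal sketch. It assumes that the pileup distributions in data and simulation are available as histograms with identical binning; the function name and the per-bin ratio scheme are a common approach assumed here, not a description of the CMS tools actually used.

```python
def pileup_weights(data_counts, mc_counts):
    """Per-bin event weights that make the MC pileup distribution match the data.

    Both inputs are lists of bin contents with identical binning in the number
    of pileup interactions. Bins that are empty in the MC get weight zero.
    """
    data_total = float(sum(data_counts))
    mc_total = float(sum(mc_counts))
    weights = []
    for d, m in zip(data_counts, mc_counts):
        if m == 0:
            weights.append(0.0)
        else:
            # Ratio of the normalized data and MC distributions in this bin.
            weights.append((d / data_total) / (m / mc_total))
    return weights
```

Each simulated event is then weighted by the entry corresponding to its number of pileup interactions; by construction the total weighted MC yield is unchanged when every populated data bin is also populated in the MC.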

Events used in this search are collected by a set of online triggers. The first level (L1) of the
CMS trigger system, composed of custom hardware processors, uses information from the
calorimeters and muon detectors to select the most interesting events in a fixed time interval of
less than 4 $\mu$s. The High Level Trigger (HLT) processor farm further decreases the event rate
from around 100 kHz to around 300 Hz, before data storage. We employ three categories of triggers
for this search: (i) hadronic razor triggers with moderate/tight requirements on R and $M_R$;
(ii) muon razor triggers with looser requirements on R and $M_R$ and at least one muon in the
central part of the detector with $p_T$ > 10 GeV; and (iii) electron razor triggers with R and
$M_R$ requirements similar to those for muon razor triggers, and at least one electron with
$p_T$ > 10 GeV satisfying loose isolation criteria. Events collected with the muon and electron
razor triggers are used to provide control regions for background studies, since the potential
signal contribution in these events is negligible. The search for the presence of a new physics
signal is performed in the events collected with the hadronic razor triggers.

All events are required to have at least one good reconstructed interaction vertex [26]. Events
containing calorimeter noise, or large $E_T^{miss}$ due to instrumental effects (such as beam halo
or jets near non-functioning channels in the ECAL), are removed from the analysis [27]. The jets in
the event, which are required to have $|\eta|$ < 3.0, are reconstructed from the calorimeter energy
deposits using the infrared-safe anti-$k_T$ algorithm [28] with a distance parameter of 0.5, and
are corrected for the non-uniformity of the calorimeter response in energy and $\eta$ using
corrections derived from Monte Carlo and observed data [29]. The $E_T^{miss}$ is reconstructed
using the particle-flow algorithm, which identifies and reconstructs individually the particles
produced in the collision, namely charged hadrons, photons, neutral hadrons, electrons, and
muons [30].

4.1 Muon and electron identification and selection 

We select muon and electron candidates using a cut-based approach similar to the selection
process used for the measurement of the inclusive W and Z cross section [31].

We use the "tight" and "loose" muon identification criteria, and all muons are required to have
$p_T$ > 20 GeV. For loose muons, we require that the muon candidate has at least 10 hits in the
inner tracker. For the tight muon we require in addition that the following selections are met:

• at least one hit in the pixel detector;

• impact parameter in the transverse plane $|d_0|$ < 0.2 cm;

• $|\eta|$ < 2.4.

In addition, the tight muons satisfy a lepton isolation requirement. The combined isolation
$I_{comb}$ is obtained by summing the $p_T$ of tracks and the energies of calorimetric energy
deposits in a cone of $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ < 0.3 around the lepton
candidate, excluding the candidate's $p_T$. We require the combined isolation to be less than 15%
of the muon $p_T$.
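
A minimal sketch of the cone-based isolation described above follows. It assumes that tracks and calorimeter deposits are supplied as hypothetical `(pt_or_et, eta, phi)` tuples with the lepton candidate itself already excluded, and it omits the pileup correction; the function names are invented for this illustration.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def relative_isolation(lepton, deposits, cone=0.3):
    """Sum of deposits inside the cone, divided by the lepton pT.

    lepton:   (pt, eta, phi) of the candidate.
    deposits: list of (pt_or_et, eta, phi), candidate excluded.
    """
    total = sum(
        d[0] for d in deposits
        if delta_r(lepton[1], lepton[2], d[1], d[2]) < cone
    )
    return total / lepton[0]
```

A candidate would then pass the isolation requirement quoted in the text if `relative_isolation(...) < 0.15`.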

The selection requirements for prompt electrons are:

• $p_T$ > 20 GeV and $|\eta|$ < 2.5;

• combined isolation $I_{comb}$ < 15% of the electron $p_T$;

• standard electron identification for barrel (endcap) electrons, defined as follows:

    • shower shape compatible with that of an electron, defined by a measure of the second
      moment of the energy distribution among crystals, $\sigma_{i\eta i\eta}$ < 0.012 (0.031) [31];

    • track-cluster matching in the $\phi$-direction, $\Delta\phi$ < 0.8 (0.7);

    • track-cluster matching in the $\eta$-direction, $\Delta\eta$ < 0.007 (0.011).

When the isolation requirements [31] are applied to the electron or tight muon candidates, the
combined isolation $I_{comb}$ is corrected for pileup dependence using the average energy density
$\rho$ from other proton-proton collisions in the same beam crossing, calculated for each
event [32].

4.2 Identification of b jets 

Jets originating from a b quark are identified ("tagged") by the TCHE algorithm [33]. Selecting
events with b-tagged jets reduces the background from QCD multijet events where mismeasured
light-flavor jets cause large apparent $E_T^{miss}$. In the TCHE algorithm a jet is considered as
b tagged if there are at least two high-quality tracks within the jet, each with a
three-dimensional impact parameter (IP) significance IP/$\sigma_{IP}$ larger than a given threshold
("operating point"). In this analysis we use the "medium" operating point [33]. The b-tagging
efficiency and mistag rate have been measured up to $p_T$ = 670 GeV; in the $p_T$ range
80-120 GeV they are found to be 0.69 ± 0.01 and 0.0286 ± 0.0003, respectively. In the following we
refer to the sample with two jets tagged by the medium TCHE tagger as the "2b-tagged" sample. A
scale factor (per jet) of 0.95 ± 0.02 is applied to the MC simulation samples to account for the
observed differences in the b-tagging efficiency between the simulation and data [33].



5 Search strategy 

Candidate signal events in this search contain a pair of b jets, large $E_T^{miss}$, and no
isolated leptons. The main backgrounds that contribute to this final state originate from
$t\bar{t}$+jets, HF multijets, and W/Z+HF jets events. Diboson production is included in the total
background estimation, but its contribution is small. Significant $E_T^{miss}$ in multijet events
derives from b quarks decaying semileptonically or from jet energies being severely mismeasured.
Apart from the multijet background, the remaining backgrounds originate from processes with
genuine $E_T^{miss}$ due to both energetic neutrinos and undetected charged leptons from vector
boson decays.

Data sets collected with the razor triggers are examined for the presence of a well-identified
electron or muon, as described in Section 4.1. Based on the presence or absence of such a
lepton, the event is categorized into one of three disjoint event samples (boxes), referred to
as the electron (ELE), muon (MU), and hadronic (HAD) boxes.

These requirements define the inclusive baseline selection:

• MU box: events collected with muon razor triggers and containing one loose muon
with $p_T$ > 20 GeV, $M_R$ > 400 GeV and $R^2$ > 0.14.

• ELE box: events collected with electron razor triggers and containing one loose
electron with $p_T$ > 20 GeV, $M_R$ > 400 GeV and $R^2$ > 0.14.

• HAD box: events collected with hadronic razor triggers and not satisfying any other
box requirements, and with $M_R$ > 400 GeV and $R^2$ > 0.2.
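
The box assignment listed above can be written schematically as follows. This is only a sketch: the text does not specify the priority between the MU and ELE boxes or whether "one loose muon" means exactly one, so the ordering (MU first) and the "at least one" interpretation used here are assumptions, as are the function name and the string labels for the triggers.

```python
def classify_box(triggers, n_loose_muons, n_loose_electrons, m_r, r2):
    """Return 'MU', 'ELE', 'HAD', or None for an event.

    triggers: set of fired razor trigger categories, e.g. {"hadronic"}.
    The loose leptons are assumed to already satisfy pT > 20 GeV.
    m_r is in GeV; r2 is the squared razor ratio R^2.
    """
    if m_r <= 400.0:
        return None
    # Assumed priority: MU box first, then ELE box, then HAD box.
    if "muon" in triggers and n_loose_muons >= 1 and r2 > 0.14:
        return "MU"
    if "electron" in triggers and n_loose_electrons >= 1 and r2 > 0.14:
        return "ELE"
    if "hadronic" in triggers and r2 > 0.2:
        return "HAD"
    return None
```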



We also require that there are at least two jets with $p_T$ above 60 GeV in each event, to ensure
that the trigger is fully efficient for our selected events. In order to study and estimate the
background contributions in the HAD box, we treat muons and electrons in the MU and ELE boxes as
neutrinos, i.e. the lepton four-vector is used to recalculate the $E_T^{miss}$ vector and the R
variable is recomputed. This procedure generates the kinematic properties of the background events
in the HAD box, using events from the MU and ELE boxes that, because of the presence of the
leptons, are free of the signals relevant to this analysis.
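
Treating a charged lepton as a neutrino amounts to adding its transverse momentum to the missing transverse momentum vector before recomputing R. A minimal sketch, with hypothetical names and plain `(x, y)` tuples in GeV:

```python
import math


def met_with_lepton_as_neutrino(met, lepton_pt_vec):
    """Return the recalculated missing transverse momentum vector and its magnitude.

    met:           original missing transverse momentum (x, y).
    lepton_pt_vec: transverse momentum (px, py) of the lepton that is
                   treated as if it had escaped detection.
    """
    new_met = (met[0] + lepton_pt_vec[0], met[1] + lepton_pt_vec[1])
    return new_met, math.hypot(new_met[0], new_met[1])
```

The recalculated vector then replaces the original $E_T^{miss}$ in the computation of $M_T^R$ and R, while the lepton is left out of the pseudojets.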

The distributions of the discriminating variables R and $M_R$ for the main backgrounds
(heavy-flavor multijets and $t\bar{t}$) are estimated from observed data. Events in the MU box are
used to extract the probability density functions (PDFs) describing the behavior of the R and
$M_R$ shapes for each process of interest. For the W/Z+HF-jets and diboson backgrounds we use
heavy-flavor-enriched MadGraph simulation samples to get the shape prediction. The procedure
to extract the background shapes is described in detail in Section 6, and the samples used are
summarized in Table 1.

To predict the SM background normalizations in the signal region we adopt the following
strategy. The events in the ELE and HAD boxes are split into two exclusive categories:

• sideband: events with 400 < $M_R$ < 600 GeV and 0.2 < $R^2$ < 0.25;

• high $R^2$: events with $M_R$ > 400 GeV and $R^2$ > 0.25.

The 2b-tagged high-$R^2$ events in the HAD box define the signal search region. The
normalizations of the SM backgrounds in the signal region are obtained through a two-step
procedure:

• the SM processes are normalized according to their theoretical cross sections, except
for $t\bar{t}$, where the measured CMS cross section [34] is used;

• the total background prediction in the high-$R^2$ region is multiplied by a scale factor
($f_{R^2}$) to correct for imperfect knowledge of the multijet production cross section.

The scale factor is derived from events in the sideband, and is defined as
$f_{R^2} = N_{exp}/N_{obs}$, where $N_{exp}$ is obtained using the background PDFs normalized to
their individual cross sections, and $N_{obs}$ is the number of observed events.

In order to avoid potential bias in the search, before analyzing the events in the HAD box
signal region, we test our understanding of the SM background estimation procedure in control
regions, using the MU and ELE boxes. This is done by comparing the background shapes
derived from the MU box to the observed data in the ELE box (removing the leptons from the
reconstruction to emulate $E_T^{miss}$ in each case). To ensure that both the shapes and
normalizations of the background components describe the observed events, the procedure to be used
in the HAD box (see Table 1 below) is first employed and tested in the ELE box (Section 6.5).
Events in the ELE sideband are used to obtain the scale factor $f_{R^2}^{ELE}$, which is used to
test the background prediction in the high-$R^2$ ELE box. Once the procedure is validated in the
ELE box, the $f_{R^2}^{HAD}$ is derived from events in the sideband of the HAD box, and is used to
predict the normalization of the backgrounds in the signal region.



6 Background estimation 

In both simulation and observed data, the distributions of SM background events have been
shown to have a simple exponential dependence on the razor variables R and $M_R$ over a large
fraction of the $R^2$-$M_R$ plane [17, 18]. The shape of the $M_R$ tail is well described by two
exponentials with slope parameters $S_i$ (i = 1, 2), where each $S_i$ depends linearly on the
$R^2$ selection threshold ($R^2_{cut}$): $S_i = A_i + B_i \times R^2_{cut}$.



Table 1: Summary of samples used in the search, with a short description of their specific
purpose. Events in all samples are required to have $M_R$ > 400 GeV and to include two
b-tagged jets. The selections on $R^2$ listed in the table are applied after recalculating
$E_T^{miss}$ and R for events in which charged leptons are treated as neutrinos. The definitions
of muons ($\mu$) and electrons (e) are discussed in Section 4.1.

Sample    $R^2$ cut               Leptons        Comment
W/Z MC    $R^2$ > 0.07            tight $\mu$    shape of W/Z+HF jets
MU        $R^2$ > 0.14            tight $\mu$    shape of $t\bar{t}$+jets
MU        $R^2$ > 0.14            loose $\mu$    shape of HF multijets
ELE       0.2 < $R^2$ < 0.25      tight e        $M_R$ < 600 GeV, sideband to extract $f_{R^2}^{ELE}$
ELE       $R^2$ > 0.25            tight e        ELE "signal-like" control region
HAD       0.2 < $R^2$ < 0.25      veto leptons   $M_R$ < 600 GeV, sideband to extract $f_{R^2}^{HAD}$
HAD       $R^2$ > 0.25            veto leptons   signal box, search for signal

We construct a simultaneous fit across different $R^2$ bins, where the $M_R$ distribution is
fitted for each value of the $R^2$ threshold to extract the $A_i$ and $B_i$ parameters. The
simultaneous fit allows one to fully exploit the correlations between the fit parameters and
therefore (i) to get a better estimate of the uncertainty in the $A_i$ and $B_i$ parameters, and
(ii) to ensure that the PDF obtained from the fit can be used in regions with various $R^2$
thresholds. The functional form used in the fit for a fixed value of the $R^2$ threshold is:

$F(M_R) = e^{-(A_1 + B_1 R^2_{cut}) M_R} + f\, e^{-(A_2 + B_2 R^2_{cut}) M_R}$,    (5)

where f, the relative amplitude of the second exponent, is extracted from the fit. The values
of the shape parameters that maximize the likelihood in the fits, along with the corresponding
covariance matrix, are used to define the background model and the uncertainty associated
with it. Therefore, if a pure sample of a given process is selected, the PDF describing the
behavior of the R and $M_R$ shapes of that process can be extracted.
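
To make the role of the shape parameters concrete, the sketch below evaluates a two-exponential $M_R$ tail of the assumed form $F(M_R) = e^{-S_1 M_R} + f\,e^{-S_2 M_R}$ with $S_i = A_i + B_i R^2_{cut}$, and uses its analytic integral to extrapolate from one $M_R$ threshold to a higher one. The function names and all parameter values in the usage note are hypothetical, not fit results from the analysis.

```python
import math


def slopes(a1, b1, a2, b2, r2_cut):
    """Slope parameters S_i = A_i + B_i * R2_cut for the two exponentials."""
    return a1 + b1 * r2_cut, a2 + b2 * r2_cut


def tail_integral(m_min, s1, s2, f):
    """Integral of exp(-s1*M) + f*exp(-s2*M) from m_min to infinity."""
    return math.exp(-s1 * m_min) / s1 + f * math.exp(-s2 * m_min) / s2


def tail_fraction(m_lo, m_hi, a1, b1, a2, b2, f, r2_cut):
    """Fraction of events with M_R > m_lo that also have M_R > m_hi."""
    s1, s2 = slopes(a1, b1, a2, b2, r2_cut)
    return tail_integral(m_hi, s1, s2, f) / tail_integral(m_lo, s1, s2, f)
```

For instance, with f = 0 and a single slope of 0.007 GeV^-1, moving the threshold from 400 to 600 GeV keeps a fraction exp(-1.4), about 25%, of the events.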

The fits are performed using the RooFit toolkit [35]. The background PDFs are then used to
generate pseudoexperiments, to evaluate the effects of systematic uncertainties on the event
yields, as described below in Section 6.4.

6.1 The W/Z+jets background 

Owing to the lack of a high-purity data sample enriched in events with W/Z+two heavy-flavor
jets, we estimate the shape of the W/Z+jets background using MC simulated events. A
selection of events in the observed data whose jets fail to be b-tagged could provide a sample
enriched in W+light-flavor jets. However, because of the dependence of the b-tagging efficiency on
the jet $p_T$ [33], the PDF extracted from these events does not provide a sufficiently accurate
model for W/Z+b jets events. Therefore, we estimate the shape of the W/Z+jets background using
simulated events generated with the MadGraph event generator interfaced with PYTHIA, which were
found to give an adequate description of CMS observed data [36, 37]. Residual deficiencies of
this MC simulation-based background modeling are accounted for in the extraction of the
$t\bar{t}$ background estimate from observed data, as described in Section 6.2. The overall
normalization of this background is determined using the observed events in the sideband region of
the HAD box.



We perform an unbinned fit of the W/Z+jets $M_R$ distribution in simulated events passing the
MU box selections with 2b-tagged events, using the sum of two separate exponential terms, as
shown in Eq. (5). The fit allows us to obtain a parametric description of the background that is
later used in the derivation of the remaining backgrounds, and it also permits the extrapolation
of the prediction into the region of higher R and $M_R$ values. The fit is performed in the region
$M_R$ > 400 GeV and is binned in values of $R^2$, as shown in Fig. 1. The fit to the simulated
data, which provides a good description of the $M_R$ distribution, is used as the PDF to estimate
the W/Z+b jets background in the signal box.



[Figure 1 plot: CMS Simulation, $\sqrt{s}$ = 7 TeV, MU Box]

Figure 1: $M_R$ distributions for different values of the $R^2$ threshold for events passing the
MU box selections in the W/Z+jets MC simulation. The results of the fits (lines) are overlaid with
the $M_R$ distributions from the MC simulation (markers).



6.2 tt+jets background estimation 

We estimate the $t\bar{t}$ background from the MU box, using 2b-tagged events in collision data
(Section 4.2) and requiring the presence of a muon passing the tight identification requirements
(Section 4.1). Based on comparisons with the MC simulation, approximately 90% of the events
in this sample are $t\bar{t}$. We find empirically from MC simulation studies that the shape of
the $M_R$ distribution in both the tightly selected MU box and in the HAD box is very similar, as
can be seen in Fig. 2. We therefore use the shape derived from the 2b-tagged sample to predict the
$t\bar{t}$ background in the signal region. Additionally, because of a non-negligible contribution
of W/Z+HF events in this sample, the imperfections in the W/Z+jets background modeling in
the simulation are absorbed into the $t\bar{t}$ background prediction. In order to derive the
$t\bar{t}$ shape, we constrain the W/Z+jets shape to that obtained from the MC simulation
(Section 6.1). We find that a two-exponential function provides a good fit to the observed data in
the MU box, as shown in Fig. 3.

6.3 Multijet background

The remaining backgrounds that contribute significantly to the interesting region of high $R^2$
originate from heavy-flavor enriched multijet production. We use events with a loose muon
in the MU box to derive the multijets background PDF. According to the MC simulation, this
sample is composed of 45% top events, 5% W/Z+b jet events, and 50% multijet events.

Figure 2: The $M_R$ distributions (left) in $t\bar{t}$ MC simulated events selected with the tight
MU, tight ELE, or HAD requirements, and (right) the ratio of the number of events selected with
the HAD or tight MU selections, as a function of $M_R$.

Figure 3: The result of the fit of the $M_R$ distributions (lines) compared to MU box observed
events with $R^2$ > 0.14 (left); individual background contributions are not stacked. On the right
are shown the $M_R$ distributions for different values of the $R^2$ threshold in 2b-tagged
events of the MU box with a tight muon; the results of the fits (lines) are overlaid with the
observed distributions (markers).



We proceed to perform the fits, for which the contributions from W/Z+b jets and $t\bar{t}$
backgrounds are fixed to the PDFs described in Sections 6.1 and 6.2. Based on simulation studies it
is found that the parameters of the second component in the fit function ($A_2$ and $B_2$ in
Eq. (5)) are nearly identical for the multijet and the $t\bar{t}$+jets background processes. In
order to better constrain the multijet fit, the parameters of the second component are set equal to
those from the observed events for $t\bar{t}$+jets, while the parameters of the first component of
the multijet PDF are left free. The results of the fit in the 2b-tagged MU box are displayed in
Fig. 4, where we find good agreement between the fit results and observed data.



[Figure 4 plots: two panels, each labeled CMS 4.8 fb$^{-1}$ at $\sqrt{s}$ = 7 TeV, MU Box;
horizontal axis $M_R$ [GeV], 400 to 2000; legend: Observed data, Total background, Multijets,
$t\bar{t}$ + jets, W/Z + jets]


Figure 4: The result of the fit of the $M_R$ distributions (lines) compared to the MU box observed
data for events with $R^2$ > 0.14 (left); individual contributions of backgrounds are not stacked.
On the right are shown the $M_R$ distributions for different values of the $R^2$ threshold in
2b-tagged events of the MU box with a loose muon; the results of the fits (lines) are overlaid
with the observed distributions (markers).



6.4 Systematic uncertainties 

For the backgrounds estimated from observed events, the uncertainty in the total yield arises
from the uncertainties (statistical and systematic) in the fit parameters in Eq. (5). We estimate
these uncertainties by varying the $R^2$ threshold values (by ±5%), thus arriving at a new set
of $A_i$ and $B_i$ parameters describing the background PDF. The maximum difference observed
between the experimental data and the simulated data in the MU box with tight and loose
muon selections is then used as the uncertainty in the shape parameters. This procedure results
in a 10% uncertainty in the $A_i$ values, and 40% in the $B_i$ values. We also tested the stability
of the fits by varying the initial parameters used to start the fit by ±50%, and found that this
variation results in stable solutions, returning the same central values for the $A_i$ and $B_i$
parameters.

We generate an ensemble of pseudoexperiments, based on the fit results in the MU box. From
each pseudoexperiment a new set of values for the parameters is then obtained, with the
corresponding uncertainties, and we use the associated PDF results to predict the background yield.
The ensemble of pseudoexperiments thus provides a distribution of the expected background
yield in the signal regions, with its corresponding uncertainty. This procedure allows us to
correctly propagate the systematic uncertainty in the background shape into the prediction of
the background. To account for the normalization uncertainty we propagate the uncertainty in
the $f_{R^2}$ introduced in Section 5 to the prediction of background yields in the signal region
from control samples in observed events.
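
The propagation of shape-parameter uncertainties through pseudoexperiments can be illustrated with a simplified toy. The sketch assumes a single-exponential $M_R$ tail with slope $S = A + B \times R^2_{cut}$, samples (A, B) from a two-dimensional Gaussian defined by a 2x2 covariance matrix, and extrapolates a fixed number of events above a lower $M_R$ threshold to a higher one. The real analysis uses the full two-exponential PDFs and RooFit; the function name and all numbers in the usage note are hypothetical.

```python
import math
import random


def sample_yields(a0, b0, cov, r2_cut, n_above_lo, m_lo, m_hi, n_toys=1000, seed=1):
    """Toy background yields above m_hi for correlated Gaussian (A, B) parameters.

    cov is ((var_A, cov_AB), (cov_AB, var_B)) and must be positive definite.
    """
    (caa, cab), (_, cbb) = cov
    # Cholesky factor of the 2x2 covariance matrix.
    l11 = math.sqrt(caa)
    l21 = cab / l11
    l22 = math.sqrt(cbb - l21 * l21)
    rng = random.Random(seed)
    yields = []
    for _ in range(n_toys):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a = a0 + l11 * z1
        b = b0 + l21 * z1 + l22 * z2
        slope = a + b * r2_cut
        # Fraction of the exponential tail above m_lo that survives above m_hi.
        yields.append(n_above_lo * math.exp(-slope * (m_hi - m_lo)))
    return yields
```

The mean and spread of the returned list then play the role of the predicted background yield and its shape-related uncertainty, for example via `statistics.mean` and `statistics.pstdev`.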

The effects of the jet energy scale (JES) and jet energy resolution (JER) uncertainties on the
W/Z+jets background estimate and the signal model yields from simulation are taken into
account. These effects are evaluated by repeating the extraction of all background PDFs,
first varying the JES/JER by plus or minus one standard deviation in the W/Z+jets background
model and recalculating the $E_T^{miss}$ and R. These variations correspond to uncertainties as
large as 3% in the selection efficiency. We then re-derive the background model PDFs from observed
data in the MU box, using the newly obtained W/Z+HF jets model. The new set of PDFs with
their corresponding covariance matrices then serves as an alternative background model.

We apply a scale factor of about 0.95, which is weakly dependent on jet p_T, to account for an
observed difference in b-tagging efficiency between data and simulation. The uncertainty in the
scale factor varies from 0.03 to 0.05 for jets with p_T from 30 to 670 GeV, and is 0.10 for b jets
with p_T > 670 GeV. These uncertainties are measured using a dijet sample with high b-jet
purity, as detailed in Ref. [33].



The uncertainty in the b̃_1 acceptance due to uncertainties in the parton distribution functions
is calculated using the recommendation from the PDF4LHC group [38]. The parton distribution
function and α_s variations at next-to-leading order (NLO) in the MSTW2008 [39],
CTEQ6.6 [40], and NNPDF2.0 [41] sets were taken into account, and their impact on the signal
cross sections was compared with the calculation with CTEQ6L1 [42] that was used in the
simulation of the signal samples. From these three sets we evaluate an upper and a lower bound
on the signal efficiency for each pair of assumed b̃_1 and χ̃^0 masses, and half of the difference
between the two bounds is used as an estimate of the uncertainty. The theoretical cross section
of LQ3 production has been calculated using CTEQ6L1 and CTEQ6M [42] at NLO, and the
uncertainty in the prediction of the cross section was estimated by repeating the calculation
using the NLO MRST2002 parametrization [43]. This uncertainty was found to vary from 3.5% to
25% for leptoquarks in the mass range considered in this analysis [44].
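
The envelope step described above (upper and lower bounds taken over the three PDF sets, with
half of their difference used as the uncertainty) can be sketched as follows. The efficiency
values and the function name are hypothetical, and the sketch assumes that the per-set lower
and upper variations have already been computed; the details of the PDF4LHC prescription are
not reproduced here.

```python
def pdf_envelope_uncertainty(efficiencies):
    """Envelope over PDF sets.

    efficiencies maps a PDF-set name to a tuple (central, down, up) of signal
    efficiencies for one mass point. Returns (upper, lower, half_difference).
    """
    upper = max(up for _, _, up in efficiencies.values())
    lower = min(down for _, down, _ in efficiencies.values())
    return upper, lower, 0.5 * (upper - lower)


# Hypothetical efficiencies for a single pair of masses (not values from the analysis).
effs = {
    "MSTW2008": (0.100, 0.097, 0.103),
    "CTEQ6.6": (0.102, 0.098, 0.106),
    "NNPDF2.0": (0.099, 0.096, 0.102),
}
upper, lower, unc = pdf_envelope_uncertainty(effs)
print(f"envelope: [{lower:.3f}, {upper:.3f}], uncertainty: {unc:.3f}")
```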

The systematic uncertainty in the luminosity measurement is taken to be 2.2% [45], and is
correlated among all signal channels and the background estimates that are derived from
simulations. The uncertainty in the trigger efficiency is estimated using a set of prescaled razor
triggers with low thresholds, and is found to be 2% for events in the HAD box, and 3% for events
in the MU and ELE boxes.

6.5 ELE control region 

In order to check that our background shape modeling indeed predicts the observed data
adequately, we use the PDFs obtained in the steps described above (Sections 6.1-6.3) in an
orthogonal sample in the 2b-tagged ELE box with a tight electron selection, i.e. the sample with
a well-identified electron, which is then treated as a neutrino. This signal-depleted sample
provides an independent cross-check of our background modeling, and covers the same region in
R and M_R as the HAD box. Additionally, based on MC simulation studies, the composition
of the tight ELE sample in observed events is similar to that of the HAD sample, consisting of
approximately 85% tt̄, 5% W/Z+HF jets, and 10% multijet events. For comparison, the HAD
sample is expected to contain approximately 70%, 5%, and 25% of the respective backgrounds.

Using the background model PDFs obtained from the fits, we derive the distribution of the
expected shapes in the ELE box using pseudoexperiments. In order to correctly account for
correlations and uncertainties in the parameters describing the background model, the shape
parameters used to generate each pseudoexperiment dataset are sampled from the covariance
matrix returned by the fit. The actual number of events in each dataset is then drawn from
a Poisson distribution centered on the yield returned by the covariance-matrix sampling. For
each pseudoexperiment dataset, the number of events in the sideband and in the high-R^2 region
is found. We then obtain the scale factor f_{R^2}^{ELE} = 0.87 ± 0.14 from the sideband region,
which is used to predict the overall yield of background events in the high-R^2 region of the
ELE box.

The comparison of the predicted M_R distribution with the observed events in the ELE box is
shown in Figure 5, and the background model is found to predict the observed data adequately.
We also test our ability to correctly predict the yields of SM backgrounds using the scale factor
mentioned above. The results are summarized in Table 2. The total background yield in the
sideband is normalized to the number of observed data events in the sideband, in order to derive
the scale factor f_{R^2}^{ELE} described in Section 5. The uncertainties in the background yields
shown here represent systematic uncertainties that are estimated by varying the parameters A_j
and B_j, as described in Section 6.4. As can be seen in this comparison, the f_{R^2}^{ELE} obtained
from the sideband allows one to predict the overall normalization of the 2b-tagged sample.

Table 2: Comparison of the yields in the ELE box. The sideband here refers to 2b-tagged events
in the ELE box with 400 < M_R < 600 GeV and 0.2 < R^2 < 0.25, while "signal-like" refers to
2b-tagged events with M_R > 400 GeV and R^2 > 0.25. The scale factor derived in the sideband
(f_{R^2}^{ELE} = 0.87 ± 0.14) is used to normalize the background yield in the signal-like region
(third column), and the uncertainty in f_{R^2}^{ELE} is propagated into the total background yield.

                      Sideband        Signal-like
Multijets             12.5 ± 1.9      10 ± 11
W/Z+jets              3.6 ± 1.9       8.8 ± 2.8
tt̄+jets               58.8 ± 7.7      118.4 ± 9.8
Other backgrounds     0 ± 0           0.6 ± 1.0
f_{R^2}^{ELE}         0.87 ± 0.14
Total background      65 ± 13         119 ± 23
Observed data         65              121

Figure 5: The M_R distribution for observed data in the 2b-tagged ELE box for events with
R^2 > 0.25 compared to the prediction. The background model derived from the MU box is
used to predict the M_R shapes of the background processes. The individual contributions are
not stacked.
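
The sideband normalization can be checked with a few lines of arithmetic using the yields
quoted in Table 2. The sketch below assumes that the scale factor is simply the ratio of the
observed to the predicted sideband yield, as the text describes; the function name is
hypothetical, and the propagation of the uncertainties is not reproduced.

```python
def sideband_scale_factor(observed_sideband, predicted_sideband):
    """Ratio of the observed sideband count to the summed predicted sideband yield."""
    return observed_sideband / sum(predicted_sideband.values())


# Unscaled background yields from Table 2 (central values only).
sideband = {"Multijets": 12.5, "W/Z+jets": 3.6, "tt+jets": 58.8, "Other": 0.0}
signal_like = {"Multijets": 10.0, "W/Z+jets": 8.8, "tt+jets": 118.4, "Other": 0.6}

f_ele = sideband_scale_factor(65, sideband)
total_signal_like = f_ele * sum(signal_like.values())
print(f"scale factor: {f_ele:.2f}")
print(f"normalized signal-like yield: {total_signal_like:.1f}")
```

With these inputs the scale factor comes out at about 0.87 and the normalized signal-like
yield at about 120 events, consistent with the tabulated values within rounding.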



We perform another check to test whether the R^2 dependence is well described by our
background model. This check is needed since in the final signal region we have several signal
boxes, each optimized for different signal mass hypotheses. In order to increase the sensitivity
for higher masses, a tighter selection on R^2 is imposed to reduce the backgrounds further, while
keeping the signal efficiency high. In order to ensure that our background model adequately
describes observed data with higher R^2 thresholds, we perform the same procedure in the ELE
box. The results are summarized in Table 3. Here, we use the same f_{R^2}^{ELE} derived from the
sideband. As can be seen from these results, this model correctly predicts the total yields for
higher R^2 boxes.

Table 3: Expected and observed yields in the 2b-tagged ELE box for R^2 selections and a fixed
requirement M_R > 400 GeV. The quoted uncertainties on the expected number of events include
statistical and systematic uncertainties, and the uncertainty on the scale factor f_{R^2}^{ELE}.

R^2 cut       Expected yields    Observed yields
> 0.25        119 ± 23           121
0.25-0.30     51 ± 17            48
0.30-0.35     30 ± 10            26
0.35-0.38     9.9 ± 5.2          11
0.38-0.42     11.5 ± 5.0         11
> 0.42        16.8 ± 4.8         25



7 Results 

We search for LQ3 and b̃_1 signals in the HAD box data sample using the background PDFs
obtained from the MU box (Sections 6.1-6.3). The predicted background yields and their
uncertainties are summarized in Table 4. The total background yield in the 2b-tagged sideband is
normalized to the number of observed data events in the sideband, in order to derive the scale
factor f_{R^2}^{HAD} = 1.10 ± 0.13, as described in Section 5. The distributions of R and M_R
observed in the 2b-tagged HAD box are compared to the background prediction in Fig. 6.

Table 4: Comparison of the yields in the 2b-tagged (signal region) samples in the HAD box.
The uncertainties include the systematic uncertainty in the background shapes (Section 6.4)
and statistical uncertainties. The uncertainty in the total yield after scaling also includes the jet
energy scale uncertainty. The scale factor derived in the sideband (f_{R^2}^{HAD} = 1.10 ± 0.13) is
used to normalize the background yield in the signal-like region. The uncertainty in f_{R^2}^{HAD}
is propagated and included in the quoted uncertainty in the expected background yields.

                      Sideband        Signal-like
Multijets             81.0 ± 9.5      34.5 ± 6.5
W/Z+jets              4.5 ± 2.2       11.4 ± 3.5
tt̄+jets               68.7 ± 9.6      140 ± 11
Other backgrounds     0.08 ± 0.29     0.16 ± 0.48
f_{R^2}^{HAD}         1.10 ± 0.13
Total background      170 ± 25        205 ± 28
Observed data         170             200



Figure 6: Comparison of the background prediction with the data observed in the 2b-tagged
sample in the HAD signal box for the M_R (left) and R (right) distributions. The expected
contributions from LQ3 and b̃_1 signal events with various mass hypotheses are also shown.

As seen in Fig. 6 and Table 4, both the number of observed events and the shapes of the R and
M_R distributions are in agreement with the expected SM backgrounds. Therefore, we proceed
to define two signal regions, to enhance the sensitivity for different LQ3 masses. The regions
are optimized to provide the lowest expected cross section limits, by varying the thresholds on
R and M_R. We find that M_R > 400 GeV provides the best sensitivity for all masses, and for LQ3
masses below 350 GeV the optimal selection is R^2 > 0.25, while for higher masses R^2 > 0.42
provides the best sensitivity. Because of the high value assumed for the χ̃^0 mass in the b̃_1
search, the inclusive selection of M_R > 400 GeV and R^2 > 0.25 is found to provide the optimal
sensitivity in the mass range considered in this analysis.

Table 5 compares the expected background yields with the observed event counts in these signal
boxes, and the two agree. Table 6 shows the efficiency of these selections for several LQ3 mass
hypotheses, based on MC simulation. Efficiencies for the b̃_1 signal are shown in Fig. 7. Typical
efficiencies range from a few percent up to about 12% for b̃_1 masses between 200 and 500 GeV
and small χ̃^0 mass. The efficiency drops when the mass of the b̃_1 squark is close to the mass
of the χ̃^0, since the resulting b jets are softer in these scenarios.

Table 5: Expected and observed yields in the 2b-tagged HAD box for various R^2 selections and
a fixed M_R > 400 GeV requirement. The quoted uncertainties on the expected number of events
include statistical and systematic uncertainties, and the uncertainty from f_{R^2}^{HAD}. The left
three columns show inclusive yields above the R^2 threshold, while the right three columns
show the yields in bins of R^2.

R^2 cut    Expected yields    Observed yields    R^2 bins     Expected yields    Observed yields
> 0.25     205 ± 28           200                0.25-0.30    105 ± 25           97
> 0.30     100 ± 16           103                0.30-0.35    44 ± 11            49
> 0.35     56 ± 12            54                 0.35-0.38    13 ± 9             14
> 0.38     43 ± 9             40                 0.38-0.42    18 ± 6             13
> 0.42     25 ± 7             27                 > 0.42       25 ± 7             27



The statistical model for the observed number of events is a Poisson distribution with the
expected value equal to the sum of the signal and expected backgrounds. Log-normal priors for
the nuisance parameters are used to model the systematic uncertainties listed in Section 6.4.

Figure 7: Signal efficiency for simulated b̃_1 signal events with M_R > 400 GeV and R^2 > 0.25.
White lines show the iso-efficiency contours for 1, 5, and 10% signal efficiency, respectively.

Table 6: Summary of the expected LQ3 signal yields and efficiency in the signal region, for
4.7 fb^-1 of observed data, in events with M_R > 400 GeV. For LQ3 masses below 350 GeV,
R^2 > 0.25 is required, while for heavier masses we require events to pass R^2 > 0.42. All
uncertainties are statistical only.

M_LQ3 [GeV]    σ [pb]    Efficiency (%)    Number of expected events
200            12        0.33              185 ± 13
250            3.5       1.1               171.2 ± 9.1
280            1.8       1.8               151.4 ± 3.6
320            0.82      3.2               122.8 ± 1.9
350            0.48      1.8               39.2 ± 1.3
450            0.095     4.3               19.17 ± 0.38
550            0.024     5.9               6.59 ± 0.12
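
The expected signal yields in Table 6 are related to the cross section, the selection
efficiency, and the integrated luminosity by N = σ × ε × L. The following sketch evaluates
this product for three of the tabulated mass points. It assumes that the cross section column
is given in pb, and the function name is hypothetical. Because the tabulated inputs are
rounded, the results agree with the table only approximately.

```python
LUMINOSITY_PB = 4.7 * 1000.0  # 4.7 fb^-1 expressed in pb^-1


def expected_signal_events(cross_section_pb, efficiency_percent, lumi_pb=LUMINOSITY_PB):
    """Expected number of signal events, N = sigma * efficiency * luminosity."""
    return cross_section_pb * (efficiency_percent / 100.0) * lumi_pb


# (LQ3 mass [GeV], cross section [pb], efficiency [%]) taken from Table 6.
points = [(200, 12.0, 0.33), (450, 0.095, 4.3), (550, 0.024, 5.9)]
yields = {mass: expected_signal_events(xsec, eff) for mass, xsec, eff in points}
for mass, n_events in yields.items():
    print(f"M_LQ3 = {mass} GeV: about {n_events:.1f} expected events")
```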

A 95% CL upper limit is set on the potential signal cross section, as summarized in Table 7.
The modified frequentist construction CLs [46, 47] is used for the limit calculation. These limits
are interpreted in terms of limits on the LQ3 pair production cross section, as shown in Fig. 8.
The upper limits are compared to the NLO prediction of the LQ pair production cross section
[44], and we set a 95% CL exclusion on LQ masses smaller than 440 GeV (expected 470 GeV),
assuming β = 0. We also present the 95% CL limit on β as a function of LQ3 mass, as shown on
the right side of Fig. 8.
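
To illustrate the idea behind a CLs limit, the sketch below treats a single-bin counting
experiment with a known background and no systematic uncertainties. This is a deliberate
simplification and not the procedure used in the analysis, which includes log-normal nuisance
parameters. The background and observed count are the central values of the R^2 > 0.42 row of
Table 5, the efficiency is the 450 GeV entry of Table 6, and the function names are
hypothetical.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total


def cls(signal, background, n_obs):
    """CLs = CL(s+b) / CL(b) for a single-bin counting experiment."""
    return poisson_cdf(n_obs, signal + background) / poisson_cdf(n_obs, background)


def upper_limit(background, n_obs, cl=0.95, tol=1e-6):
    """Signal yield at which CLs falls to 1 - cl, found by bisection."""
    lo, hi = 0.0, 10.0
    while cls(hi, background, n_obs) > 1.0 - cl:
        hi *= 2.0  # widen the bracket until the upper edge is excluded
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(mid, background, n_obs) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


s_up = upper_limit(background=25.0, n_obs=27)
# Convert the event-count limit into a cross section limit: sigma = N / (efficiency * L).
sigma_up_pb = s_up / (0.043 * 4700.0)
print(f"95% CL upper limit: {s_up:.1f} events, {sigma_up_pb:.3f} pb")
```

Neglecting the systematic uncertainties makes this toy limit tighter than a limit computed
with the full statistical model would be.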

Table 7: Observed and expected 95% CL upper limits on the LQ3 pair-production cross section
as a function of the LQ3 mass.

M_LQ3 [GeV]    −2σ      −1σ      Median expected limit [pb]    +1σ      +2σ     Observed limit [pb]
200            2.0      3.3      4.5                           6.2      8.4     4.3
250            0.64     1.1      1.4                           2.0      2.6     1.3
270            0.43     0.75     0.97                          1.4      1.8     0.90
330            0.18     0.24     0.33                          0.46     0.62    0.36
350            0.13     0.17     0.23                          0.32     0.42    0.25
450            0.047    0.067    0.092                         0.13     0.17    0.10
550            0.037    0.049    0.066                         0.094    0.13    0.073

Figure 8: (Left) The expected and observed upper limit at 95% CL on the LQ3 pair production
cross section as a function of the LQ3 mass, assuming β = 0. The systematic uncertainties
reported in Section 6.4 are included in the calculation. The vertical greyed region is excluded
by the current D0 limit [12] in the same channel. The theory curve and its band represent,
respectively, the theoretical LQ3 pair production cross section and the uncertainties due to the
choice of parton distribution functions and renormalization/factorization scales [44]. (Right)
Minimum β for a 95% CL exclusion of the LQ3 hypothesis as a function of LQ3 mass. The
observed (expected) exclusion curve is obtained using the observed (expected) upper limit and
the central value of the theoretical LQ3 pair production cross section. The band around the
observed exclusion curve is obtained by considering the observed upper limit while taking
into account the uncertainties on the theoretical cross section. The grey region is excluded by
the current D0 limits [12] in the same channel.

The results of the analysis are interpreted in the context of the simplified supersymmetry model
spectra (SMS) [48-50]. In SMS, a limited set of hypothetical particles and decay chains is
introduced to produce a given topological signature, such as the E_T^miss plus b jets final state
considered in this analysis. We consider an SMS scenario where all supersymmetric particles are
set to have a very large mass, except for the b̃_1 and χ̃^0. The pairs of scalar bottom quarks
produced through strong interactions are kinematically allowed to decay only into a b quark and
a χ̃^0.
The observed and expected 95% CL upper limits in the b̃_1-χ̃^0 mass plane are shown in
Fig. 9, where the b̃_1 pair production cross section is calculated at NLO and next-to-leading-
logarithm (NLL) order [51-56]. Since M_R depends on the squared difference of the masses of
b̃_1 and χ̃^0, at b̃_1 masses around 400-450 GeV and low χ̃^0 masses the exclusion limit is
almost independent of the χ̃^0 mass. The signal acceptance in the region with small mass
splitting between the b̃_1 and χ̃^0 is particularly susceptible to uncertainties associated with
initial-state radiation (ISR). The impact of ISR is estimated by comparing the results of the
acceptance calculation using PYTHIA with the "power shower" and with moderate ISR settings [19].
If the acceptance varies by more than 25% for a particular choice of b̃_1 and χ̃^0 masses, then
no limit is set for those mass parameters. This procedure results in reduced sensitivity in the
region of m(b̃_1) < 300 GeV and 80 < m(χ̃^0) < 130 GeV, and thus an inability to exclude some
of the models in this parameter range.

Figure 9: The expected and observed 95% CL exclusion limits for the b̃_1 pair production SMS
model. The red dashed contour shows the 95% CL exclusion limits based on the NLO+NLL
cross section. The red dotted contours represent the theoretical uncertainties from the variation
of parton distribution functions, and renormalization and factorization scales. The corresponding
expected limits are shown with the black dashed contour. The shaded yellow contours
represent the uncertainties in the SM background estimates, as reported in Section 6.4.



8 Summary 

A search has been performed for third-generation scalar leptoquarks and for scalar bottom
quarks in the all-hadronic channel with a signature of large E_T^miss and b-tagged jets. This
search is based on a data sample collected in pp collisions at √s = 7 TeV and corresponding to
an integrated luminosity of 4.7 fb^-1. The number of observed events is in agreement with the
predictions for the SM backgrounds. We set an upper limit on the LQ3 pair production cross
section, excluding a scalar LQ3 with mass below 450 GeV, assuming a 100% branching fraction
of the LQ3 to b quarks and tau neutrinos. We set 95% confidence level upper limits in the
b̃_1-χ̃^0 mass plane such that for neutralino masses of 50 GeV, scalar bottom masses up to
410 GeV are excluded. These results represent the most stringent limits on LQ3 masses and
extend limits on b̃_1 masses to much higher values than probed previously.